The continuous search for process improvements that reduce costs and increase efficiency is a major
challenge for organizations today. This paper addresses the importance of process improvement, focusing
on proper human resource allocation. The research predicts the optimal human resource allocation in the
chemical materials control area of the Superintendencia Nacional de Aduanas (SUNAT) using a linear
regression machine learning algorithm. The model was validated with data collected at SUNAT's control
locations, and the results were compared with historical data to determine their efficiency. The model
obtained a mean squared error of 0.434, which is lower than that of the logistic regression and support
vector machine algorithms. This research recommends implementing the model in all SUNAT control
locations in Peru.
1. INTRODUCTION

Business organizations make a continuous effort to improve their processes in order to reduce costs and
resources and to increase revenue. One of the most important factors is optimizing human resources so that
each process or activity has just the right number of workers. This study focuses on determining the optimal
human resource allocation in the chemical materials control area of the Superintendencia Nacional de Aduanas
(SUNAT). The literature contains many papers on the importance of human resource allocation. Many factors
are considered for this aim, such as the number of workers or finding a mix of skills that covers the job and
the process requirements [1]-[5]. The literature also shows that the most widely used methods for choosing
the optimal resource allocation are statistical methods and machine learning (ML) algorithms [6]-[10]. Among
the algorithms most often used, neural networks have been applied to predict human resource allocation for
container terminals [11]-[13]. Decision trees and linear regression, combined in a multilayer perceptron
algorithm, have been used to predict human resource allocation for different business processes [14].
Resource allocation allows organizations to achieve organizational goals by improving cost, time, or
quality [7].

One of the most important problems in the chemical materials control area of SUNAT is determining the
right number of workers required for three activities: visual inspection, document inspection, and interior
inspection. These activities are needed to determine whether vehicles carry any prohibited materials, which
are used in drug production. When vehicle traffic is high, a long queue of cars accumulates and causes
discomfort among drivers. In this case there are not enough workers, and vehicles are sometimes let through
without proper supervision. On other days there is little vehicle traffic and workers are idle. This is the
reason to optimize the allocation of workers, taking into account historical data from the intervention
records. The proposed solution is to implement an ML algorithm (linear regression), evaluate the data, and
determine the correct model to optimize the number of workers required for each weekday at the control
points. This investigation is highly relevant to SUNAT and to public institutions in Peru, not only for
optimizing resource allocation but also for demonstrating that data solutions are important for automating
and improving processes in public institutions and for delivering a better service to society.

This investigation required data collection. The parameters required for the linear regression machine
learning model are the date of the intervention, the name of the day, the number of interventions carried
out, the number of workers assigned, the number of police officers assigned, and the quantity of material
confiscated in kilograms. The data for this work were obtained at the Asia control office (south of Lima)
from January to June 2022. Fifty thousand records were collected, representing the interventions carried
out during this period.

For the forecasting and the model implementation, the data were split in two: 70 percent for training
and 30 percent for testing [15]. The training data were used to define the model and determine the
coefficients and variables of the linear regression algorithm. After the model was trained, the test data
were used to evaluate the model and compare results. Metrics such as R-squared (R2) and mean squared
error (MSE) were calculated and compared with the results of other models. The data analysis was carried out
in Python, with numpy and sklearn as the two most important libraries for this process, and the integrated
development environment (IDE) was Visual Studio Code. The methodology used for the development of the
research is detailed below.
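
The 70/30 split described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library rather than the sklearn utilities used in the study. The function name, the fixed seed, and the sample records are assumptions made for the example and are not the SUNAT data.

```python
import random


def train_test_split(rows, test_fraction=0.30, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical daily records standing in for the collected data.
records = [{"day_number": 1 + i % 7, "interventions": 10 + i} for i in range(100)]
train, test = train_test_split(records)
print(len(train), len(test))  # 70 30
```

Shuffling before splitting avoids training only on the earliest months, which matters when traffic changes with the season.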


2. METHOD 
2.1. Linear regression algorithm 

In this research, the linear regression algorithm is used to predict the number of workers required to
carry out the supervision tasks in the chemical materials control offices of SUNAT. Regression analysis
is a technique for modeling the relationship between variables and is used in engineering, physics,
chemistry, and other fields. It is the most widely used technique for statistical analysis and prediction.
Applications of this model include data description, parameter estimation, prediction and estimation, and
control [16]. Linear regression consists of analyzing dependent and independent variables and the
relationship between them, which can be represented as a line that crosses the collected data. The simplest
linear regression model is given by the mathematical expression $Y = \alpha + \beta X + \varepsilon$, where
$\alpha$ and $\beta$ are the regression coefficients, $X$ is the independent variable, $Y$ is the dependent
variable, and $\varepsilon$ is the error term [17]. Figure 1 shows the full pipeline for predicting
the number of workers required to carry out the activities at the chemical materials control points of SUNAT.
Figure 1. Pipeline for the linear regression prediction
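
To make the simple linear regression model of Section 2.1 concrete, the sketch below estimates the intercept and slope by ordinary least squares using the textbook closed-form expressions. The function name and the toy data are assumptions for illustration. The study itself fitted its model with sklearn.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) that minimize the squared error of y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on the line y = 7 + 0.2 * x.
interventions = [10, 20, 30, 40]
workers = [9, 11, 13, 15]
intercept, slope = fit_simple_linear_regression(interventions, workers)
```

With noisy data the fitted line no longer passes through every point, and the residuals play the role of the error term.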


Step 1: SUNAT has 10 checkpoints distributed across Lima, Cusco, and the VRAEM. The information
used for this analysis was recorded at the Asia checkpoint only. To collect the data, an Excel sheet with a
Visual Basic interface was used. The detailed information is given in Table 1. The data contain fifty
thousand rows from January to June 2022.


Table 1. Parameters

Variable                    Description
Intervention date           Date on which the intervention was performed
Name of week                Name of the day of the week
Number of interventions     Number of interventions per day
Number of workers           Number of workers assigned that day
Number of police officers   Number of police officers assigned that day
Material confiscated        Confiscated material found that day, in kilograms


Step 2: extract, transform, and load (ETL) process. This process prepares the data before they are used
in the analysis. The first phase is extraction, in which the data are gathered from their different sources.
The second phase is transformation, in which validation is performed to correct data errors. The last phase
is loading, in which the new clean and validated data are placed in an integrated repository [18]. The ETL
process and the data analysis were implemented in Python, a free tool commonly used for data analysis and
predictions [19].
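
The three ETL phases can be sketched in Python as below. This is only an illustration under stated assumptions: the CSV layout, the column names, and the validation rules are hypothetical, since the paper does not give the export format of the Excel sheet.

```python
import csv
import io
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical raw export; two rows contain deliberate errors.
RAW_CSV = """\
intervention_date,interventions,workers,police_officers,confiscated_kg
2022-01-03,25,7,2,1.5
2022-01-04,abc,6,2,0
2022-13-40,18,5,1,0
2022-01-05,18,5,1,0.0
"""


def extract(text):
    """Extraction: read the raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: convert types and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            date = datetime.strptime(row["intervention_date"], "%Y-%m-%d").date()
            record = {
                "intervention_date": date.isoformat(),
                "day_name": DAY_NAMES[date.weekday()],
                "interventions": int(row["interventions"]),
                "workers": int(row["workers"]),
                "police_officers": int(row["police_officers"]),
                "confiscated_kg": float(row["confiscated_kg"]),
            }
        except (KeyError, ValueError):
            continue  # malformed date or number: skip the row
        if min(record["interventions"], record["workers"]) < 0:
            continue  # negative counts are not valid
        clean.append(record)
    return clean


def load(records, repository):
    """Loading: append the validated records to the target repository."""
    repository.extend(records)
    return len(records)


repository = []
loaded = load(transform(extract(RAW_CSV)), repository)
```

Here the repository is a plain list to keep the example short. In the study the validated data were stored in a database.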

Step 3: database repository. The data obtained from the ETL process need to be stored in a repository
or database. The database selected for this analysis was SQLite. This tool is the least complicated
transactional database to use in projects, yet it keeps the most important benefits of other transactional
databases, such as durability and fast performance [20]. The data were split and stored in three tables: one
for the chemical materials that are and are not allowed during inspections, another to record the
participation of human resources in inspections, and the last one to store the detailed information of the
inspections.
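
A possible layout for the three tables is sketched below with Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions, because the paper does not publish its schema.

```python
import sqlite3

# Hypothetical schema: materials, staffing per day, and inspection details.
SCHEMA = """
CREATE TABLE chemical_material (
    material_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    is_allowed  INTEGER NOT NULL CHECK (is_allowed IN (0, 1))
);
CREATE TABLE staffing (
    intervention_date TEXT PRIMARY KEY,
    workers           INTEGER NOT NULL,
    police_officers   INTEGER NOT NULL
);
CREATE TABLE inspection (
    inspection_id     INTEGER PRIMARY KEY,
    intervention_date TEXT NOT NULL REFERENCES staffing(intervention_date),
    material_id       INTEGER REFERENCES chemical_material(material_id),
    confiscated_kg    REAL NOT NULL DEFAULT 0
);
"""


def create_repository(path=":memory:"):
    """Open a SQLite database and create the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def interventions_per_day(conn):
    """Count the inspections recorded for each date."""
    query = """
        SELECT intervention_date, COUNT(*)
        FROM inspection
        GROUP BY intervention_date
        ORDER BY intervention_date
    """
    return conn.execute(query).fetchall()


conn = create_repository()
conn.execute("INSERT INTO staffing VALUES ('2022-01-03', 7, 2)")
conn.executemany(
    "INSERT INTO inspection (intervention_date, confiscated_kg) VALUES (?, ?)",
    [("2022-01-03", 0.0), ("2022-01-03", 1.5)],
)
conn.commit()
print(interventions_per_day(conn))  # [('2022-01-03', 2)]
```

Aggregating the inspection table by date in this way yields the daily number of interventions that the regression uses as a predictor.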

Step 4: data analysis and algorithm selection. In this step the data are split into training data and test
data in order to determine the correct coefficients of the linear regression algorithm. First, the
relationship between the independent variables and the dependent variable was analyzed. This is a supervised
learning problem [21], [22] because the dependent variable to predict (number of workers) is identified. The
variables name of week and number of interventions had the highest correlation with it (±0.77) [23]. After
determining the variables needed for the model, the sklearn library [24] was used in Python to train the
model with these variables, and the mathematical model in (1) was obtained.


$$\text{Number of workers} = -0.61\,\text{Number of day} + 0.21\,\text{Number of interventions} + 6.89 \tag{1}$$
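
As an illustration, equation (1) can be evaluated with a few lines of Python. The coefficients are those reported in (1). The mapping from day names to day numbers (Monday = 1 through Sunday = 7) is an assumption, since the encoding is not stated in the paper, and the input values are hypothetical.

```python
# Assumed encoding of the "Number of day" variable (not given in the paper).
DAY_NUMBER = {
    "Monday": 1, "Tuesday": 2, "Wednesday": 3, "Thursday": 4,
    "Friday": 5, "Saturday": 6, "Sunday": 7,
}


def predict_workers(day_name, interventions):
    """Evaluate equation (1) for one day and return the unrounded prediction."""
    day_number = DAY_NUMBER[day_name]
    return -0.61 * day_number + 0.21 * interventions + 6.89


# Hypothetical input: a Monday with 10 interventions.
monday_prediction = predict_workers("Monday", 10)
```

The negative coefficient on the day number means that, for the same number of interventions, the predicted staffing falls as the week advances. The prediction is a real number, so it has to be rounded to a whole number of workers before scheduling.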


These results were obtained using the training data. To validate them, the test data were used to compare
the calculated results with the real results, and the MSE metric was calculated [25]. The logistic regression
algorithm [26] and the support vector machine (SVM) algorithm [27] were also evaluated to determine which
algorithm predicts the number of workers most accurately. The linear regression algorithm has the lowest
error (Table 2).


Table 2. MSE comparison

Algorithm                 MSE
Linear regression         0.434
Logistic regression       0.543
Support vector machine    0.65
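
The metrics used for this comparison, MSE and R2, follow their standard definitions and can be computed as in the sketch below. The function names and the sample values are illustrative assumptions and are not taken from the study's test set.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between real and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true must not be constant")
    return 1 - ss_res / ss_tot


# Hypothetical real and predicted numbers of workers for four days.
actual = [7, 6, 5, 4]
predicted = [6.5, 6.0, 5.5, 4.0]
mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
```

A lower MSE means predictions closer to the observed staffing, which is the criterion applied in Table 2.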


Steps 5 and 6: in these two final steps, the final reports are prepared and presented to the organization.
The results included predictions for the next three months with the optimized number of human resources
required for the Asia checkpoint, as well as a report detailing the steps to update the data and the model
in order to improve the model.

Equation (1) gives the full mathematical model, which determines the forecast values (number of
workers). It is represented in Figure 2, where the regression line is drawn through the data and shows
the predicted values.
Figure 2. Graph of linear regression


3. RESULTS AND DISCUSSION 
3.1. Results of the prediction 

As a final step, with the linear regression model validated, new data were evaluated with the model to
predict the optimized number of workers required per day of the week. The results are shown in Table 3.
The first days of the week require more human resources, on average seven people, while on weekends the
requirement decreases to four people on average.


Table 3. Result of the linear regression 
Day Number of workers 
Monday 7 
Tuesday 
Wednesday 
Thursday 
Friday 
Saturday 
Sunday 




3.2. Discussion 
The data in Table 3 indicate that during the weekends there is not much vehicle traffic to control.
It is therefore possible to reduce the number of workers required or to assign them to other activities,
such as a mobile control team that inspects vehicles at rotating locations. Because of the temperature
changes across the weather seasons in Peru, vehicle traffic could change, so it is recommended to
recalculate the algorithm every three months, which is the length of each weather season. To obtain more
benefits, the six steps of this analysis should be replicated at the other checkpoints in Lima, Cusco, and
the VRAEM, and the recalculation interval of the algorithm should follow the weather at each checkpoint.


4. CONCLUSION 
In this study, the number of workers required for the control activities was predicted. The data
collected are from the last six months at the Asia checkpoint, south of Lima. With this new human resource
allocation, it was possible to optimize the performance of each worker by having them support not only
interventions but also administrative and management work.